Add measured site difficulty index to FAQ #358

Open
ulziibay-kernel wants to merge 4 commits into main from hypeship/site-difficulty-index

ulziibay-kernel (Contributor) commented May 11, 2026

Summary

Replaces the Unsupported Websites section in browsers/faq.mdx with a measured site difficulty index. Each site's block + challenge rate comes from actually running stealth Kernel browsers against it, not from practitioner lore.

Methodology

For each of 31 sites:

  • N=5 concurrent stealth browser sessions
  • US residential proxy (different exit IP per session)
  • Navigate to the public landing URL only — no login, no deep navigation
  • Classify each session as success / challenged / blocked using vendor signatures (Cloudflare, DataDome, PerimeterX, Imperva, Akamai, Kasada, reCAPTCHA, hCaptcha)
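
The classification step above could be sketched roughly as follows. This is a minimal illustration, not the harness used for this PR: the `SessionResult` type, the marker strings, and the choice of 403/429 as block statuses are all assumptions, since the PR does not list its actual vendor signatures.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical markers for illustration; the PR does not list the signatures it used.
CHALLENGE_MARKERS = {
    "Cloudflare": ("just a moment...",),
    "DataDome": ("captcha-delivery.com",),
    "PerimeterX": ("px-captcha",),
    "reCAPTCHA": ("google.com/recaptcha",),
    "hCaptcha": ("hcaptcha.com",),
}
# Assumption: these HTTP statuses without a challenge page count as a hard block.
BLOCK_STATUSES = {403, 429}


@dataclass
class SessionResult:
    status: int  # HTTP status of the landing-page response
    body: str    # rendered HTML of the landing page


def classify(result: SessionResult) -> Tuple[str, Optional[str]]:
    """Return (outcome, vendor), where outcome is success / challenged / blocked."""
    body = result.body.lower()
    # A recognizable challenge page takes priority over the raw status code.
    for vendor, markers in CHALLENGE_MARKERS.items():
        if any(marker in body for marker in markers):
            return "challenged", vendor
    if result.status in BLOCK_STATUSES:
        return "blocked", None
    return "success", None
```

Substring matching like this is deliberately crude: a page that merely embeds a CAPTCHA script on a normal form would be counted as challenged, so a real classifier would likely need stricter checks.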

Headline results

| Group | Sites |
| --- | --- |
| Hard (≥40% block rate) | Yelp 100% (DataDome), Glassdoor 100% (Cloudflare), Indeed 40% (Cloudflare + Imperva), TripAdvisor 40% (DataDome) |
| Light (1–39% block rate) | Yellow Pages 20% (Cloudflare), Zillow 20% (PerimeterX) |
| Clear (0% block rate) | LinkedIn, Facebook, Instagram, TikTok, X, Reddit, Amazon, Booking.com, Airbnb, Walmart, Google Search, Google Maps, YouTube, Pinterest, Target, Crunchbase, eBay, Etsy, Medium, IMDb, Cars.com, Gymshark (Shopify), GitHub, Yahoo Finance, Wikipedia, Facebook Marketplace |
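
As a rough sketch of how per-session outcomes map to these groups, the snippet below computes a rate from a list of outcomes and applies the thresholds from the table. The function names and the `counted` parameter are hypothetical; by default only `blocked` sessions are counted, and passing `("blocked", "challenged")` would give a combined block + challenge rate instead.

```python
from typing import Iterable, Sequence


def failure_rate(outcomes: Sequence[str], counted: Iterable[str] = ("blocked",)) -> float:
    """Percentage of sessions whose outcome is in `counted`."""
    if not outcomes:
        raise ValueError("at least one session outcome is required")
    counted_set = set(counted)
    hits = sum(1 for outcome in outcomes if outcome in counted_set)
    return 100.0 * hits / len(outcomes)


def bucket(rate: float) -> str:
    """Map a percentage to the Hard / Light / Clear groups used in the table."""
    if rate >= 40:
        return "Hard"
    if rate > 0:
        return "Light"
    return "Clear"


# Two of five sessions blocked lands exactly on the Hard threshold.
sessions = ["blocked", "blocked", "success", "success", "success"]
print(bucket(failure_rate(sessions)))
```

With N=5 the rate can only take the values 0, 20, 40, 60, 80, or 100%, so in practice "Light" means exactly one failing session out of five.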

Important caveats (called out in the doc)

This is a floor, not a ceiling. A site that scores 0% on an anonymous homepage visit can still be very hard once you add login, repeated requests from the same IP, deep navigation, or large concurrency. The doc says so explicitly and flags login-flow / at-scale benchmarks as future work.

Test plan

  • Mintlify preview renders the three tables + methodology block cleanly
  • #site-difficulty-index and #methodology anchors resolve
  • Cross-link from bot-detection/overview.mdx is unaffected by this change (none added in this PR)

Note

Low Risk
Low risk: documentation-only changes that don’t affect runtime behavior; main risk is minor confusion if the difficulty categorization becomes outdated.

Overview
Replaces the FAQ’s Unsupported Websites section with a Site difficulty index that groups sites by observed bot-detection friction on the public landing page (Hard/Light/Clear).

Adds expanded site lists under each category and an Info callout linking to proxy/headless guidance for cases where “Clear” sites still trigger detection.

Reviewed by Cursor Bugbot for commit c55e4a6.

Replaces flat unsupported-websites list with a five-tier index covering very-hard through very-easy targets, with framing on how to interpret the tiers and a pointer to manual baselining.
mintlify Bot (Contributor) commented May 11, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| Kernel | 🟢 Ready | View Preview | May 11, 2026, 5:29 PM |


Reframes the site difficulty index around an N=5 stealth + US residential proxy test against each site's public homepage. Three groups (Hard / Light / Clear) ranked by observed block + challenge rate, with detection vendor noted per site. Adds a methodology section and explicit caveats that this is a floor, not a ceiling — login flows and at-scale behavior are out of scope for this benchmark.
ulziibay-kernel changed the title from "Add tiered site difficulty index to FAQ" to "Add measured site difficulty index to FAQ" May 11, 2026
cursor Bot approved these changes May 11, 2026
Keeps Hard / Light / Clear grouping but drops the percentage tables in favor of plain lists. The methodology section still describes how sites get bucketed.
cursor Bot left a comment

Stale comment

Risk assessment: Very Low

Evidence from the current diff: this PR only changes browsers/faq.mdx, replacing an unsupported-websites FAQ section with a site difficulty index, methodology text, and documentation callouts. There are no runtime codepaths, shared libraries, configuration, infrastructure, auth, billing, permissions, or schema changes.

I also checked the update since the prior approval: it only changes the same documentation section from detailed tables/inline lists to short explanatory copy and bullet lists. That does not increase the risk level, so I’m not dismissing the existing approval. Because the PR is already approved, I’m leaving this as a comment rather than submitting another approval.

Sent by Cursor Automation: Assign PR reviewers

cursor Bot left a comment

Stale comment

Risk assessment: Very Low

I assessed the current diff directly. It only modifies browsers/faq.mdx, replacing the unsupported-websites FAQ copy with a site difficulty index, short explanatory text, bullet lists, and an existing-style Mintlify Info callout. There are no runtime codepaths, shared libraries, config/navigation changes, infrastructure changes, auth/billing/permissions changes, or schema/data changes.

Codeowners review does not appear to be required because this repo has no CODEOWNERS file. GitHub already shows the PR as approved, so I’m leaving this assessment as a comment rather than submitting another approval. The updates after the prior approval are still confined to this documentation section and do not increase risk.

Sent by Cursor Automation: Assign PR reviewers

ulziibay-kernel marked this pull request as ready for review May 11, 2026 17:57
@firetiger-agent
Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: This PR only modifies documentation (FAQ and site difficulty index in browsers/faq.mdx) and does not change any kernel API endpoints or Temporal workflows.

To monitor this PR anyway, reply with @firetiger monitor this.

cursor Bot left a comment

Risk assessment: Very Low

I assessed the current diff directly. The PR only modifies browsers/faq.mdx, replacing the unsupported-websites FAQ section with a site difficulty index, explanatory copy, site lists, and an existing Mintlify Info callout pattern. There are no runtime codepaths, shared libraries, configuration/navigation changes, infrastructure changes, auth/billing/permissions changes, or schema/data changes.

Codeowners review does not appear to be required because this repo has no CODEOWNERS file. GitHub already reports the PR as approved, including an approval before the latest commit; this update does not increase risk, so I’m leaving the existing approval in place and not submitting another approval.

Sent by Cursor Automation: Assign PR reviewers
